Abstract

In this paper we describe a new efficient algorithm for recognizing 3D objects by combining photometric
and geometric invariants. We derive photometric properties that are invariant to changes
of illumination and to relative object motion with respect to the camera and/or the lighting source in
3D space. We argue that conventional color constancy algorithms cannot be used in the recognition of
3D objects. Further, we show that recognition does not require full color constancy; rather, it only needs
something that remains unchanged under varying light conditions and poses of the objects. Combining
the derived color invariants and the spatial constraints on the object surfaces, we identify corresponding
positions in the model and the data space coordinates, using centroid invariance of corresponding groups
of feature positions. Tests are given to show the stability and efficiency of our approach to 3D object
recognition.
1 Introduction

A typical approach to model-based object recognition[14] matches
stored geometric models against features extracted from
an image, where the features are typically localized geometric
events, such as vertices. Objects are considered
to have undergone a transformation in space to yield
a novel view for the image. To solve for this transformation
explicitly, recognition methods use matches of
features to hypothesize a transformation, which is used
to align the model with the image and select the best-fit
pair of transformation and model. While this approach
to recognition has achieved considerable success, there
still remain practical problems to be solved.

One such problem is the computational complexity of
the method. For example, even with popular algorithms
(e.g. [20, 32]), to recognize an object with $m$ features from
an image with $n$ features, we must examine $m^3 n^3$ combinations
of hypotheses, where $m$ and $n$ can easily be on
the order of several hundred in natural pictures. A second
problem is the tolerance of the algorithm to scene
clutter. To verify the hypothesized transformation, object
recognition algorithms have to collect evidence of
actual correspondences characterized by that transformation.
This is usually done by looking for the nearest image
features around the transformed model features, or
equivalently by casting votes to a hash table of parameters,
such as affine invariant parameters, leading to a
correspondence (e.g. [24]). In either case, when features
are extracted from the image with perturbations, and if
the image is cluttered so that the feature distribution is
too dense, it is difficult to tell whether an image feature
thus detected is the one actually corresponding to the
model feature or whether it just happened to fall close to the
transformed model feature. This issue has been extensively
analyzed, both theoretically and empirically, giving
arguments about the limitations of geometric feature
based approaches to recognition (e.g. [15, 1, 14]).

Considering the limitations of conventional approaches
to recognition which depend solely on geometrical
features, it is natural to start using cues other
than simple local geometric features. One such candidate
is photometric information like color, because we
know that color often characterizes objects well and it
is almost invariant to changes of view and lighting
conditions. In parallel with geometry, color properties
of the object surface should be a strong key to the perception
of the surface. However, most authors who have
exploited color in recognition used it simply for segmentation,
e.g., [5, 29, 16], mostly because color is considered
to contribute more to building up salient features
on the object surface than to giving precise information
on the locations and the poses of the objects. Exceptions
include Swain[27, 28] and Nayar et al. [26], who have
used photometric information more directly for recognition,
respectively for indexing and matching processes.
At the same time, however, they abandoned the use of
local geometric features, which are still very useful in predicting
the locations and the poses of the objects. Swain
used only a color histogram for representing objects and
matched it over the image to identify the object included
and localize its presence in the image. Nayar et al. proposed
a photometric invariant for matching regions with
consistent colors, given the partitioned model and image
derived by some other color properties. Their method therefore
requires a preliminary segmentation of the image into
regions having consistent colors.

In this paper, we attempt to exploit both geometric
and photometric cues to recognize 3D objects, by combining
them more tightly. Our goal is to develop an
efficient and reliable algorithm for recognition by taking
advantage of the merits of both geometric and color
cues: on the one hand, the ability of color to generate larger and thus
more salient features reliably, and to add more
selectivity to features, which enables more efficient and
reliable object recognition; on the other hand, the rich information carried
by the set of local geometric features, which is useful in
accurately recovering the transformation that generated
the image from the model. To realize this, we have developed
new photometric invariants which are suitable for
this approach. Then, we combine the proposed photometric
properties with the Centroid Alignment approach
of corresponding geometric feature groups in the model
and the image, which we have recently proposed [25]. This
strategy gives an efficient and reliable algorithm for recognizing
3D objects. In our testing, it took only 0.2
seconds to derive corresponding positions in the model
and the image for natural pictures.

2 Some photometric invariants 

In this section, we develop some photometric invariants
that can be used as strong cues in the recognition of
3D objects. These invariants are related to the notion of
color constancy, that is, the perceptual ability (whether in human or machine
vision) to determine the
surface reflectance property of the target objects given
the reflected light from the object surface in the receptive
field. If a color constancy algorithm could perform
sufficiently well, we could use it for object recognition because
it would provide a unique property of the object
itself. Unfortunately, however, color constancy is generally
difficult to compute in practice, so we cannot use it
by itself. The invariant property to be presented here is
efficiently computed from segmented or non-segmented
images at the same time as the geometrical features are
extracted.

2.1 Unavailability of color constancy 

Color constancy is an underconstrained problem, as we
will see in the following. Let $S(\mathbf{x}, \lambda)$ be the spectral
reflectance function of the object surface at $\mathbf{x}$, which is the
property we have to recover, let $E(\mathbf{x}, \lambda)$ be the spectral
power distribution of the ambient light, and let $R_k(\lambda)$ be
the spectral sensitivity of the $k$th sensor. Then $p_k(\mathbf{x})$, the
scalar response of the $k$th sensor channel to be observed,
is described as

$$p_k(\mathbf{x}) = \int S(\mathbf{x},\lambda)\,E(\mathbf{x},\lambda)\,R_k(\lambda)\,d\lambda \qquad (1)$$

where, generally, $S$ is a function describing geometric
and spectral properties of the surface at $\mathbf{x}$ that can be
an arbitrary function, and $E$ could also be an arbitrary
function of $\mathbf{x}$ and $\lambda$. The integral is taken over the visible
spectrum (usually from 380 to 800 nm). The geometric
factor of the object surface, which is usually considered to
include the surface normal and the relative angle of the
incident and reflecting light direction with respect to the
surface normal, is crucial in the 3D world[18]. In
addition, there are other confounding factors such as
specularities and mutual reflections on the surface. With
these complexities, to perform color constancy, that is, to
recover $S(\mathbf{x}, \lambda)$, we need to limit the world to which it is
applied. To get a simple intuition of this,
we might insert an arbitrary scalar function $C(\mathbf{x})$ in (1)
so that we have[33],

$$p_k(\mathbf{x}) = \int \{S(\mathbf{x},\lambda)\,C(\mathbf{x})\}\,\{E(\mathbf{x},\lambda)/C(\mathbf{x})\}\,R_k(\lambda)\,d\lambda. \qquad (2)$$

Clearly, when $S$ with $E$ is a solution for (1), $S' = SC$
with $E' = E/C$ is also a solution for any nonzero function $C$.
To turn this into a well-posed problem, almost all authors
have addressed problems in a strongly constrained
world like the Mondrian space [19, 13, 33, 31, 12, 9]: a
2D space composed of several matte patches overlapping
each other. Then, based on the observation that
both the ambient light and the surface reflectance for
planar surfaces can be approximated by linear combinations
of a small number of fixed basis functions[7, 21],
they can deal with the problem at a fairly feasible
level[13, 33, 31, 12, 10, 9]. A good mathematical analysis
is given in [10]. However, all of those results are for a 2D
world. This two-dimensionality assumption takes away
any chance of conventional color constancy being used in
recognizing a 3D world. Therefore, we cannot employ
conventional color constancy algorithms as presented.
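The ambiguity expressed by (2) can be checked numerically. The following is a minimal sketch, assuming a three-sample discretization of the integral in (1); the spectra and the value of C are made-up illustrative numbers, not measurements.

```python
# Discretized form of eq. (1): p_k = sum over wavelengths of S * E * R_k * dlambda.
# All spectra below are hypothetical illustrative samples.

def response(S, E, R, dlam=1.0):
    """Sensor response for sampled reflectance S, illuminant E, sensitivity R."""
    return sum(s * e * r for s, e, r in zip(S, E, R)) * dlam

S = [0.2, 0.5, 0.9]   # hypothetical surface reflectance samples
E = [1.0, 0.8, 0.6]   # hypothetical illuminant samples
R = [0.1, 0.7, 0.2]   # hypothetical sensor sensitivity samples

C = 4.0                       # arbitrary nonzero scalar at this position
S_alt = [s * C for s in S]    # S' = S * C
E_alt = [e / C for e in E]    # E' = E / C

p = response(S, E, R)
p_alt = response(S_alt, E_alt, R)   # same response from a different (S, E) pair
```

Since `p` and `p_alt` agree, the sensor response alone cannot distinguish the pair (S, E) from (S', E').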

2.2 Some color invariants 

Knowing that color constancy is not easily attainable for
any plausible 3D world, we propose a photometric invariant
property for use in the recognition of 3D objects.

Since it is known that the spectral distribution of
the surface reflectance of many materials depends very
little on the surface geometry[23], we may break up
the surface reflectance function into the product of a geometry
factor $G(\mathbf{x})$ and a spectral property $L(\mathbf{x}, \lambda)$ such that
$S(\mathbf{x}, \lambda) = G(\mathbf{x})\,L(\mathbf{x}, \lambda)$. Then equation (1) becomes:

$$p_k(\mathbf{x}) = \int G(\mathbf{x})\,L(\mathbf{x},\lambda)\,E(\mathbf{x},\lambda)\,R_k(\lambda)\,d\lambda
= G(\mathbf{x}) \int L(\mathbf{x},\lambda)\,E(\mathbf{x},\lambda)\,R_k(\lambda)\,d\lambda \qquad (3)$$

[Constant ambient light assumption over the entire surface]

If we assume that the ambient light spectral distribution
is constant over the entire surface of the objects,
$E$ becomes simply a function of the wavelength $\lambda$. This assumption
is justified when the lighting source is sufficiently
far away from the object relative to the size of
the object surface, and mutual illumination and shadowing
are not significant. This yields

$$p_k(\mathbf{x}) = G(\mathbf{x}) \int L(\mathbf{x},\lambda)\,E(\lambda)\,R_k(\lambda)\,d\lambda \qquad (4)$$

Taking the ratio between the responses of two channels $i$ and $j$
eliminates the geometric factor $G(\mathbf{x})$, which depends
on the relative orientation of the object surface
with respect to the camera and/or the lighting source:

$$\frac{p_i(\mathbf{x})}{p_j(\mathbf{x})} = \frac{\int L(\mathbf{x},\lambda)\,E(\lambda)\,R_i(\lambda)\,d\lambda}{\int L(\mathbf{x},\lambda)\,E(\lambda)\,R_j(\lambda)\,d\lambda} \qquad (5)$$

By the same reasoning, we have a similar form after
the motion of the object with respect to the camera
and/or the lighting source,

$$\frac{p'_i(\mathbf{x}')}{p'_j(\mathbf{x}')} = \frac{\int L'(\mathbf{x}',\lambda)\,E'(\lambda)\,R_i(\lambda)\,d\lambda}{\int L'(\mathbf{x}',\lambda)\,E'(\lambda)\,R_j(\lambda)\,d\lambda} \qquad (6)$$

where primes denote the function after the motion. This
prime notation applies to any symbol expressing
some quantity after the motion of the object in the
rest of this paper unless otherwise described. Note that
$L(\mathbf{x}, \lambda) = L'(\mathbf{x}', \lambda)$, because the spectral property of
the surface reflectance is not affected by the object
motion. When we approximate the spectral absorption
functions $R$ by narrow band filters such that
$R_i(\lambda) \approx s_i\,\delta(\lambda_i - \lambda)$, where $s_i$ is the channel sensitivity
and $\lambda_i$ is the peak of the spectral sensitivity of the
$i$th channel, we obtain ratios from (5) and (6):
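$$\gamma_{ij}(\mathbf{x}) = \frac{p_i(\mathbf{x})}{p_j(\mathbf{x})} \approx \frac{s_i\,L(\mathbf{x},\lambda_i)\,E(\lambda_i)}{s_j\,L(\mathbf{x},\lambda_j)\,E(\lambda_j)} \qquad (7)$$

$$\gamma'_{ij}(\mathbf{x}') = \frac{p'_i(\mathbf{x}')}{p'_j(\mathbf{x}')} \approx \frac{s_i\,L'(\mathbf{x}',\lambda_i)\,E'(\lambda_i)}{s_j\,L'(\mathbf{x}',\lambda_j)\,E'(\lambda_j)} \qquad (8)$$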


Since the bandwidth over which a real camera sensor responds
varies from camera to camera, and the standard
ones may not be very narrow, this is only an approximation.
However, experiments show that this assumption is
not unrealistic for normal cameras. Taking the ratio
of the $\gamma$'s before and after the motion and/or the change of
lighting conditions yields,
$$\frac{\gamma'_{ij}(\mathbf{x}')}{\gamma_{ij}(\mathbf{x})} = e_{ij} \qquad (9)$$

$$e_{ij} = \frac{E'(\lambda_i)\,E(\lambda_j)}{E'(\lambda_j)\,E(\lambda_i)} \qquad (10)$$


Since $e_{ij}$ is apparently independent of the position
on the surface, $\gamma_{ij}(\mathbf{x})$ can be regarded as approximately
invariant to changes of illuminant conditions and to
motions of the object, up to a consistent scale factor
over the object surface. Note that $e_{ij}$ depends only on
the ratios of the spectral distribution of the ambient light
before and after the motion of the object.
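The scale-factor invariance of the channel ratio can be illustrated with a small simulation. This is only a sketch under the narrow-band sensor approximation; the sensitivities, illuminant values, geometry factors, and reflectances are hypothetical numbers chosen for illustration.

```python
# Narrow-band sensor model: p_k(x) = G(x) * s_k * L(x, lambda_k) * E(lambda_k).
# All numbers are made-up illustrative values, not measurements.

def response(G, s_k, L_k, E_k):
    """Channel response of a narrow-band sensor at one surface position."""
    return G * s_k * L_k * E_k

def gamma(p_i, p_j):
    """Channel ratio gamma_ij = p_i / p_j; the geometry factor G cancels."""
    return p_i / p_j

s = {"R": 1.0, "G": 0.9}            # hypothetical channel sensitivities
E_before = {"R": 1.0, "G": 1.0}     # hypothetical illuminant before the change
E_after = {"R": 0.7, "G": 1.2}      # hypothetical illuminant after the change

# (geometry factor before, geometry factor after, reflectance per channel)
positions = [
    (0.9, 0.4, {"R": 0.8, "G": 0.3}),
    (0.5, 0.7, {"R": 0.2, "G": 0.6}),
    (0.3, 0.9, {"R": 0.5, "G": 0.5}),
]

def gamma_GR(G, L, E):
    """gamma for the green/red channel pair at one position."""
    p_G = response(G, s["G"], L["G"], E["G"])
    p_R = response(G, s["R"], L["R"], E["R"])
    return gamma(p_G, p_R)

# Ratio of gamma after to gamma before, one value per surface position.
scale_factors = [
    gamma_GR(G1, L, E_after) / gamma_GR(G0, L, E_before)
    for G0, G1, L in positions
]
# The factor predicted from the illuminants alone.
expected_scale = (E_after["G"] / E_after["R"]) / (E_before["G"] / E_before["R"])
```

Every entry of `scale_factors` equals `expected_scale`, regardless of the geometry factors and reflectances, which is the sense in which the ratio is invariant up to a consistent scale.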

In using $\gamma$ for object recognition, we might need to
normalize its distribution because generally it is invariant
only up to a scale factor. When we are provided
with the sets of $\gamma$ from corresponding positions over different
views, this could be done by applying a normalization
process to the original sets:

$$\hat{\gamma}_{ij} = \sigma_{ij}^{-1/2}\,\gamma_{ij} \qquad (11)$$

where $\sigma_{ij}$ is the variance of the given $\gamma_{ij}$ distribution.
Note that when the ambient light has not been changed,
$e_{ij} = 1$, so that $\gamma_{ij}(\mathbf{x}) = \gamma'_{ij}(\mathbf{x}')$, and thus the normalization
process is not needed.
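As a minimal sketch of the normalization in (11), assuming the variance is taken over the set of values at corresponding positions, two sets that differ only by a common scale factor become identical after normalization. The sample values and the scale factor are hypothetical.

```python
from statistics import pvariance

def normalize(gammas):
    """Divide each value by the square root of the variance of the set."""
    sigma = pvariance(gammas)          # variance of the gamma distribution
    return [g / sigma ** 0.5 for g in gammas]

gamma_before = [0.4, 1.1, 2.5, 0.9]    # hypothetical gamma values, first view
e = 1.7                                # hypothetical illuminant scale factor
gamma_after = [e * g for g in gamma_before]

normalized_before = normalize(gamma_before)
normalized_after = normalize(gamma_after)   # matches normalized_before
```

The sketch requires a set with nonzero variance, so it does not apply to a surface with a single uniform color.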

[Only locally constant ambient light assumption] 

Now, let us assume only a locally constant ambient
light spectral distribution, instead of the globally constant
one over the object surface: $E(\mathbf{x}_l, \lambda) = E(\mathbf{x}_m, \lambda)$
for nearby positions $\mathbf{x}_l, \mathbf{x}_m$. Then, eqs. (7) and (8) must
be modified respectively as:


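$$\gamma_{ij}(\mathbf{x}) = \frac{s_i\,L(\mathbf{x},\lambda_i)\,E(\mathbf{x},\lambda_i)}{s_j\,L(\mathbf{x},\lambda_j)\,E(\mathbf{x},\lambda_j)} \qquad (12)$$

$$\gamma'_{ij}(\mathbf{x}') = \frac{s_i\,L'(\mathbf{x}',\lambda_i)\,E'(\mathbf{x}',\lambda_i)}{s_j\,L'(\mathbf{x}',\lambda_j)\,E'(\mathbf{x}',\lambda_j)} \qquad (13)$$

Incorporating the assumption, that is, $E(\mathbf{x}_l, \lambda) = E(\mathbf{x}_m, \lambda)$
and $E'(\mathbf{x}'_l, \lambda) = E'(\mathbf{x}'_m, \lambda)$, we again have
an invariant $\psi_{ij}^{lm}$:

$$\psi_{ij}^{lm} = \frac{\gamma_{ij}(\mathbf{x}_l)}{\gamma_{ij}(\mathbf{x}_m)} = \frac{L(\mathbf{x}_l,\lambda_i)\,L(\mathbf{x}_m,\lambda_j)}{L(\mathbf{x}_l,\lambda_j)\,L(\mathbf{x}_m,\lambda_i)} \qquad (14)$$

thus, apparently, $\psi_{ij}^{lm} \approx \psi_{ij}'^{\,lm}$. However, $\psi_{ij}^{lm}$ is obviously
sensitive to perturbations contained in the image signals,
especially when the value of $\gamma_{ij}(\mathbf{x}_m)$ (the
denominator in (14)) is close to zero. To stabilize this, we
adopt a normalized measure in place of $\psi$ itself: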


$$\varphi_{ij}^{lm} = \frac{\gamma_{ij}(\mathbf{x}_l)}{\gamma_{ij}(\mathbf{x}_m) + \gamma_{ij}(\mathbf{x}_l)} \qquad (15)$$

It is easy to see that $\varphi \approx \varphi'$, that is, $\varphi$ is approximately invariant
to changes of illumination conditions and of
orientations of the object surfaces.

Note that for $\gamma_{ij}$ we cannot derive this kind of normalized
invariant formula. A very important thing to
remember here is that in order to make $\varphi$ useful, the
surface reflectance properties associated with the two nearby
positions $\mathbf{x}_l, \mathbf{x}_m$ to be picked up must be sufficiently different
from each other. Otherwise, even if the invariance
of $\varphi$ in (15) holds true, as the $\gamma$'s tend to have the same
value for $\mathbf{x}_l, \mathbf{x}_m$, the $\varphi$'s always return values that are
close to 0.5, so that $\varphi$ does not provide any useful information
about their color properties. Fortunately,
as we describe later, when color properties are picked up
from different sides of the brightness boundaries, this
situation may often be avoided.
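The behaviour of the normalized measure in (15) can be sketched as follows. The sketch assumes that a change of a locally constant illuminant multiplies the channel ratios at both nearby positions by the same factor; the numeric values are hypothetical.

```python
def phi(gamma_l, gamma_m):
    """Normalized measure of eq. (15) for two nearby positions x_l and x_m."""
    return gamma_l / (gamma_m + gamma_l)

gamma_l, gamma_m = 0.6, 1.8    # hypothetical channel ratios at x_l and x_m
e_local = 2.3                  # hypothetical common factor from the illuminant change

phi_before = phi(gamma_l, gamma_m)
phi_after = phi(e_local * gamma_l, e_local * gamma_m)   # the common factor cancels

# Two positions with the same reflectance give 0.5 and carry no color information.
phi_same_color = phi(1.2, 1.2)
```

Unlike the plain ratio of the two values, `phi` stays bounded between 0 and 1 for positive inputs even when one of the channel ratios approaches zero.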


2.3 Related photometric invariants 

An invariant related to our photometric invariants was
proposed earlier, based on an opponent color model, by
Faugeras for image processing applications^]. The opponent
color model was first introduced by Hering[17] to
describe the mechanism of human color sensation. He
advocated that the three pairs Red-Green, Blue-Yellow, and
White-Black form the basis of human color perception.
A simple mathematical formulation of this[3], which is
a linear transformation of $R, G, B$, was used as a color
invariant in [27, 28] for indexing 3D objects:
$[R{-}G,\ Bl{-}Y,\ W{-}Bk]^T = L\,[R, G, B]^T$, where $L$ is a linear transformation.
A similar formalization of an opponent color
model was also used for the correspondence process in
color stereopsis [5]. However, there are no theoretical explanations
of the linear transformation model for full
3D object surfaces, because, as we noted in the derivation
of our invariants, the surface orientation in 3D space
with respect to the lighting source and the camera is a
non-negligible factor (see also [18]) in deriving invariants
for a 3D world, and it is never removed by any linear
transformation.

Unlike this linear transformation case, Faugeras's
form for the chromatic model is the logarithm of the ratios between different
channel responses, and so is similar
to ours; for achromatic responses it is the logarithm of the product of the three
$R, G, B$ responses, but with a low-pass filtering accounting
for lateral inhibition.

In [4] a unique illuminant-invariant was proposed
which, assuming the existence of at least four local distinct
color surfaces, uses the volumetric ratio invariant
of the parallelepiped generated by the responses of the
three receptors. It seems to us, however, that the assumption
of four local distinct color surfaces is demanding
too much in practice.

Recently, a new photometric invariant was proposed
for object recognition[26]. Limiting its application to
geometrically continuous smooth surfaces only, it used
as an invariant the ratio between the brightnesses of two
adjacent regions, each with consistent and different surface
spectral reflectance. Therefore, it requires a preliminary
complete segmentation of the image into regions
having the same colors. Other assumptions introduced
in its derivation are almost the same as ours (locally constant
ambient illuminant case) except for the additional
continuous smooth surface constraint over the boundary
of two surfaces with different spectral reflectance.

2.4 Experiments 

Experiments were conducted to examine the accuracy
of the proposed photometric invariants. Figure 1 shows
pictures of a man-made convex polyhedron composed
of 6 planar surfaces, each with a different surface orientation.
The left picture is a front view of the polyhedron,
hereafter pose $P_A$, while in the right picture
the object is rotated around the vertical axis (y-axis)
by about 30 degrees, hereafter pose $P_B$. On each side of
the boundary of adjacent surfaces, several matte patches
with different colors were pasted. Then, we picked up
corresponding positions manually within each colored
patch in the pictures for the poses ($P_A$, $P_B$). The selected
positions within patches are depicted by crosses
in the pictures. To test the accuracy of the proposed
invariants $\gamma$, $\varphi$ under varying illuminant conditions and
surface orientations of the object with respect to the illuminant
and the camera, we took three pictures: the
first at the pose $P_A$ under the usual lighting conditions
($P_A$ & $L_U$), the second at the pose $P_B$ under a greenish
light ($P_B$ & $L_G$), and the third at the pose $P_B$ but
under a bluish light ($P_B$ & $L_B$). To change the source
light spectrum, i.e., to get greenish or bluish light, we
covered a tungsten halogen lamp with green or blue cellophane.
For $\varphi$, surface positions within
planar patches facing each other across the boundaries of planar surfaces
were used as neighboring positions to satisfy the
requirement of (locally) constant ambient light. To
compute the invariants in practice, we used the ratios
$G/R$, $B/R$ for $\gamma$, and $\varphi_1 = (G_1/R_1)/(G_1/R_1 + G_2/R_2)$,
$\varphi_2 = (B_1/R_1)/(B_1/R_1 + B_2/R_2)$ for $\varphi$, where $R, G, B$
are the outputs from the Red, Green, and Blue sensor channels
respectively, and the indices attached to $R, G, B$
indicate the sides of the surfaces, with respect to their boundaries,
used for computing the $\varphi$'s. As described previously,
in theory, when we use the RGB channel outputs
to compute invariants, instead of outputs through
exact narrow band filters, they might be only pseudo-invariants.
However, the following results confirm that the
values of $\gamma$ and $\varphi$ computed using RGB are fairly invariant
to changes of the illumination conditions as
well as of the surface orientations. Table 1 gives the correlation
coefficients between the sets of values for each
invariant measure computed at corresponding positions
in different pictures, measured by the
following formula:


$$\frac{C_{\alpha\alpha'}^{2}}{C_{\alpha\alpha}\,C_{\alpha'\alpha'}} \qquad (16)$$

where the $C_{ab}$ ($a, b \in \{\alpha, \alpha'\}$) are the covariances between
the sets of the values of the measure $\alpha$ (e.g., $\gamma$) before ($\alpha$)
and after ($\alpha'$) the motion of the objects or the changes
of the lighting conditions, defined by:

$$C_{ab} = \sum P(a,b)\,(a-\bar{a})(b-\bar{b}) \qquad (17)$$


where $\bar{x}$ is the average of the measure $x$, $P(a,b)$ is the
probability density function, and the sum is taken over
all corresponding values of the measures $a, b$. A high
correlation, which gives a value close to 1, shows that
the proposed invariant measures remained unchanged
up to a consistent scale over the set of positions between
the two pictures, while a low correlation, which gives
a value close to 0, means that the values of the measures
changed in an irregular manner. For comparison,
other color properties including raw $(R, G, B)$, $(H, S, V)$
(hue, saturation, value), and a linear-transformation implementation
of the opponent color model[3] are also
included. In these tests, $R$, $G$, $B$, $R-G$, $B-Y$, and $\gamma = G/R, B/R$
are almost equally good, though $\gamma$ is the best
among them on average. This means that those properties
changed, but only by a consistent scale between
the different pictures (recall the property of $\gamma$ being invariant
up to a scale factor). The reason why $R$ is very
good is probably just that we did not happen to change
the intensity of the red light spectrum. The values of
$H, S, V$ are unexpectedly quite unstable. The measure $\varphi$
is extremely stable.

To see how far the color properties
remained unchanged, in addition to the correlative relation,
Figure 2 shows the actual distributions of the color properties,
where the horizontal axes are the values
for the pose $P_A$, while the vertical axes are those for the
pose $P_B$. If the color measures remained unchanged between
the two pictures before and after the motions of
the object and/or the changes of the light conditions,
the distributions should present linear shapes, and their
slopes should be close to 1. Indeed, the measure $\varphi$ is
found to remain almost unchanged under varying
light conditions, while the other color properties $H$, $S$,
and $\gamma = G/R, B/R$ included for comparison are
not. The biases of the slopes of $\gamma$ toward either the horizontal
or the vertical axis indicate that the light spectrum
changed between the two compared pictures.

Figure 3 shows the performance of $\gamma$ constancy against
the change of the object pose, under the same lighting
conditions. In other words, unlike in the last experiments,
this time the ambient light was the same
for both pictures, and only the object pose
was changed. For comparison, the performance of
$B-Y$ (linear-transformation implementation for blue vs. yellow,
the second figure from the left) as well as raw $B$ (blue,
the first one) are also shown. Note that what should
be observed here is how close the slopes of the distributions
are to 1. Except for the two samples in the upper
area of the figure (the fourth picture), $\gamma = B/R$ is found
to be almost unchanged between the two pictures. The
two exceptional samples were from patches with an almost
saturated blue channel in the picture at pose $P_B$. The
performance of $\gamma = G/R$ (the third figure) is almost perfect.
On the other hand, $B-Y$ and $B$ are perturbed
around the slope of 1, which is probably caused by the
changed orientations of the patches. This suggests that
$\gamma$ may be used for object recognition without applying
any normalization process, so that extracting object regions
might not be a prerequisite, as long as the lighting
conditions are not changed.
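The correlation measure of (16) and (17) can be sketched in a few lines. The sketch assumes that the probability P(a, b) is uniform over the sampled pairs of corresponding values, which is an assumption made here for illustration; the sample lists are hypothetical and are not the values reported in Table 1.

```python
def covariance(a, b):
    """Eq. (17) with P(a, b) = 1/n for each of the n corresponding pairs."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n

def squared_correlation(before, after):
    """Eq. (16): squared covariance divided by the product of the variances."""
    c_ab = covariance(before, after)
    return c_ab ** 2 / (covariance(before, before) * covariance(after, after))

# A measure that changes only by a consistent scale gives a value close to 1.
scaled = squared_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])

# A measure that changes irregularly gives a value close to 0.
irregular = squared_correlation([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0])
```

Because the measure is a squared, normalized covariance, it is insensitive to a global scale factor between the two pictures, which is why it is paired with the slope plots of Figure 2.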

Similarly, Table 2 gives the results of the same tests as
above but on a natural object, a doll which is shown
in Figure 4, for which both the ambient light
and the object pose were changed. We refer to the poses
of the doll in the same way as in the above tests on the
Test-Object: left pose $P_A$, right pose $P_B$. The first picture
was taken under the usual lighting conditions from
an oblique angle ($P_A$ & $L_U$); the second and third were
taken respectively under a greenish and a bluish light
from the front angle ($P_B$ & $L_G$, $P_B$ & $L_B$). Corresponding
positions were picked up manually as in the
previous tests. As the surface colors varied smoothly,
we cannot expect to have picked up corresponding
points accurately. Thus, unwanted errors could be
introduced in this operation. This time, for $\varphi$, the two positions
which are closest to each other among the selected
points were used. In these tests, $R, G, B$ and $S, V$ performed
poorly, though somehow $H$ was very good. The
linear model $R-G$, $B-Y$ and $\gamma = G/R, B/R$ performed
well again, though $\gamma$ was better. The measure $\varphi$ is quite
stable again. Unlike the results on the Test-Object, however,
since the surface of the doll, especially in the body
parts, had similar surface colors in nearby positions, the
distribution of $\varphi$, that is, of $\varphi_1 = (G_1/R_1)/(G_1/R_1 + G_2/R_2)$ and
$\varphi_2 = (B_1/R_1)/(B_1/R_1 + B_2/R_2)$, did not spread very
well, thus having weak photometric selectivity, as
seen in Figure 5. Therefore, when picking up two nearby
positions for $\varphi$ for object recognition, it is important that
they have different spectral reflectance. For comparison,
the values of $H$, $S$, and $\gamma$ are also plotted in Figure 5.

2.5 Sensing limitations 

As we noted in the examination above, the invariant properties
are sometimes perturbed around the ideal values
which support our theories. This is caused mainly by
the limited dynamic range of the sensors of the camera.
These effects include Color Clipping and Blooming,
as argued carefully in [23]. When the incident light is
too strong and exceeds the dynamic range of the sensor,
the sensor cannot respond to that much input and
thus clips the upper level beyond the range. This means
the sensor no longer correctly reflects the intensity of the
light. Note that this is very serious for our invariants,
because both $\gamma$ and $\varphi$ are ratio invariants, and
a basis of their theory is, whether locally or globally,
the consistency of the amount of light falling onto the
positions concerned on the object surfaces. Here, our
natural and important assumption is that this consistency
is correctly reflected in the responses of the sensors.
Therefore, if the sensor response does not meet
this assumption, our theory no longer holds. The same
arguments also hold for the blooming effect. When the
incoming light is too strong to be received by the sensor
element of the CCD camera, the overloaded charge will
travel to the nearby pixels, thus corrupting the responses
of such pixels.
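The effect of clipping on a ratio invariant can be illustrated with a toy sensor model. This is a sketch only: the hard limit of 255, the channel intensities, and the brightness factors are hypothetical, and blooming is not modelled.

```python
def sense(intensity, max_level=255.0):
    """Toy sensor: responds linearly up to max_level, then clips."""
    return min(intensity, max_level)

def ratio_G_over_R(true_R, true_G, brightness):
    """Channel ratio G/R as observed through the clipping sensor."""
    return sense(brightness * true_G) / sense(brightness * true_R)

true_R, true_G = 100.0, 150.0     # hypothetical unclipped channel intensities

ratio_dim = ratio_G_over_R(true_R, true_G, brightness=1.0)     # both channels in range
ratio_bright = ratio_G_over_R(true_R, true_G, brightness=2.5)  # green channel clipped
```

In range, the brightness factor cancels and the ratio is 1.5; once the green channel saturates, the observed ratio drops to 1.02 and no longer reflects the surface.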

3 Combining photometric and geometric constraints for 3D object recognition

In this section, we describe how we can exploit the photometric
invariant developed in the preceding section for
recognizing 3D objects. The basic idea is to combine it
with the Centroid Alignment approach we have recently
proposed in [25].

3.1 Centroid invariant of geometric feature groups

We argued in [25] that when an object undergoes a linear
transformation caused by its motion, the centroid of
a group of 3D surface points is transformed by the same
linear transformation. Thus, it was shown that under an
orthographic projection model, centroids of 2D image geometric
features always correspond over different views
regardless of the pose of the object in space. This is true
for any object surface (without self-occlusion). Note
that this property is very useful, because if we have some
way to obtain corresponding feature groups over different
views, we can replace the simple local features used for defining
alignment in conventional methods by those groups,
thereby reducing computational cost. We demonstrated
the effectiveness of this approach to object recognition
on natural as well as simulation data [25].
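The centroid property described above can be verified numerically. The following is a minimal sketch with a hypothetical group of 3D surface points and a hypothetical linear transformation; orthographic projection is modelled by dropping the z coordinate.

```python
def apply_linear(A, p):
    """Apply a 3x3 linear transformation A to a 3D point p."""
    return [sum(A[r][c] * p[c] for c in range(3)) for r in range(3)]

def centroid(points):
    """Component-wise mean of a list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def project(p):
    """Orthographic projection onto the xy image plane."""
    return p[:2]

# Hypothetical group of 3D surface points and a hypothetical linear motion.
points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.5], [0.0, 2.0, 1.0], [1.0, 1.0, 3.0]]
A = [[0.8, -0.3, 0.1], [0.2, 0.9, -0.4], [0.0, 0.5, 1.1]]

# Centroid of the projected image features in the new view ...
image_points = [project(apply_linear(A, p)) for p in points]
centroid_of_images = centroid(image_points)

# ... coincides with the projection of the transformed 3D centroid.
image_of_centroid = project(apply_linear(A, centroid(points)))
```

The agreement follows from the linearity of both the transformation and the projection, so it holds for any group of points as long as the same points are visible in both views.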

3.2 Grouping by photometric and geometric constraints

To obtain corresponding groups of 2D geometric features,
we can use the proposed photometric invariant
measures associated with each feature.

In [25], to obtain corresponding geometric feature
groups, a clustering operation with a rotationally invariant
criterion was applied in coordinates
which had been normalized up to a rotation prior to
clustering. This time, we again use a clustering technique
to obtain corresponding geometric feature groups
in different views. Our intention is to yield corresponding
cluster configurations using a criterion incorporating
spatial proximity constraints of geometric features
and the invariance of their associated photometric invariants
we have proposed. Therefore, we assume that
surface colors (surface spectral reflectance) vary
from place to place but are almost consistent within some local
areas. Note that this
assumption should be justified for most object surfaces,
because otherwise we would always be seeing diffused colors
over the surface and thus always have difficulty in
trying to distinguish surfaces. We also normalize the geometric
feature distributions by the linear transformation
we presented in [25]. This transformation has been
confirmed, both mathematically and empirically, to generate
a unique distribution up to a rotation, for feature
sets from a planar surface on the object, regardless of
the surface orientations in 3D space. We note that even
3D object surfaces often tend to become planar in their
visible surfaces, thus justifying the use of our transformation
for 3D object surfaces. This will be seen later in
the experiments.

3.3 Implementation 

We employ the K-means clustering algorithm, in which the
criterion is rotationally invariant, to obtain corresponding
feature groups in the feature sets from different views.
The feature vector $f$ used in clustering is the extended
feature (extended from the local geometrical feature), which is defined
by the following vector:

$$f = [f_g^T,\ s f_p^T]^T \qquad (18)$$

where $f_g$ is the 2D geometric feature composed of the spatial
coordinates $f_g = (x, y)^T$ of a feature point in the
xy image plane, $f_p$ is the vector of photometric invariant
properties we proposed in the preceding sections,
and $s$ is a balancing parameter. Note that what we ultimately
need here is simply the configuration of geometric
features, that is $f_g$, in the clustering results, and
the photometric invariant is used only as a cue in performing
clustering.
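As a rough illustration of clustering with the extended feature of (18), the sketch below uses a plain K-means with Euclidean distance. The feature values, the balancing parameter, the fixed iteration count, and the choice of initial centers are all hypothetical, and the normalization of the geometric coordinates is omitted.

```python
def extended_feature(xy, photometric, s):
    """Eq. (18): spatial coordinates followed by s-weighted photometric values."""
    return list(xy) + [s * v for v in photometric]

def squared_distance(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points]

def kmeans(points, initial_centers, iterations=20):
    """Plain K-means; returns the final labels and cluster centers."""
    centers = [list(c) for c in initial_centers]
    for _ in range(iterations):
        labels = assign(points, centers)
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centers), centers

# Hypothetical features: (x, y) position and one photometric value each.
features = [((0.0, 0.0), [0.2]), ((1.0, 0.0), [0.2]),
            ((0.5, 0.1), [0.8]), ((1.5, 0.1), [0.8])]
s = 10.0  # hypothetical balancing parameter
points = [extended_feature(xy, ph, s) for xy, ph in features]

labels, centers = kmeans(points, [points[0], points[2]])
# Only the geometric part of each cluster center is kept for alignment.
geometric_centroids = [c[:2] for c in centers]
```

With this weighting, spatially interleaved features are separated by their photometric values, and the geometric centroids of the two groups are what would be passed on to the alignment stage.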

After the clustering, an alignment process starts by
using the centroids of the clusters so derived to recover the transformation
which generated a novel view, the image data,
from the model. It is known that only 3 point correspondences
suffice to recover the transformation, either
by using Linear Combination of the models[32] or a full
3D object model[20]. Therefore, we examine every possible
combination of triples of cluster centroids of model
and data that are generated by clustering, and select the
transformation that best generates the data from the
model in terms of their match. In our testing, which
we will see later, the number of clusters could be kept
below 10. Further, we should note that
we only need to consider the combinations of model and
data cluster centroids which have compatible values of
$\gamma$ or $\varphi$. This means that adding photometric properties
contributes not only to the clustering but also to the selectivity
of the features (cluster centroids). Therefore, considering
the computational complexity of the conventional
alignment approach to recognition, this should bring a
noticeable computational improvement.
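The pruning of hypotheses by photometric compatibility can be sketched as follows. The sketch only enumerates candidate triples; recovering and scoring the transformation for each candidate is not shown. The tolerance, the centroid positions, and the photometric values are hypothetical.

```python
from itertools import combinations, permutations
from math import comb, perm

def compatible(model_value, data_value, tol):
    """Hypothetical compatibility test on a single photometric value."""
    return abs(model_value - data_value) <= tol

def candidate_triples(model, data, tol=0.05):
    """Pairs of (model triple, ordered data triple) with compatible photometry.

    model and data are lists of (centroid_xy, photometric_value).
    """
    candidates = []
    for m_idx in combinations(range(len(model)), 3):
        for d_idx in permutations(range(len(data)), 3):
            if all(compatible(model[m][1], data[d][1], tol)
                   for m, d in zip(m_idx, d_idx)):
                candidates.append((m_idx, d_idx))
    return candidates

# Hypothetical cluster centroids with one photometric value each.
model = [((0.0, 0.0), 0.2), ((1.0, 0.0), 0.4), ((0.0, 1.0), 0.6), ((1.0, 1.0), 0.8)]
data = [((2.1, 1.9), 0.81), ((1.0, 1.1), 0.19), ((1.2, 2.0), 0.61), ((2.0, 1.0), 0.4)]

pruned = candidate_triples(model, data)
unpruned_count = comb(len(model), 3) * perm(len(data), 3)
```

In this toy case, 96 geometric hypotheses reduce to 4 once only photometrically compatible centroids are paired.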

4 Empirical results 

In this section, we show experimental results of our algorithm
for identifying corresponding positions in different
views. Tests were conducted on natural pictures including
3D objects to be recognized.

4.1 Preliminaries 

Geometric features used for our algorithm can be ex¬ 
tracted as follows: 

(Step 1) Use an edge detector[6] after preliminary smoothing to obtain edge points from
the original gray level images.

(Step 2) Link individual edge points to form edge curve contours.

(Step 3) Using local curvatures along the contours, identify corners and inflection
points by detecting high curvature points and zero crossings, respectively, based on the
method described in [20]. Before detecting these features, we smooth the curvatures
along the curves[2].
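
Step 3 can be illustrated with a simplified sketch. It is not the method of [20]: here curvature is approximated by the signed turning angle of a polyline, smoothing is a (1, 2, 1)/4 filter, and `corner_threshold` is a hypothetical parameter.

```python
from math import atan2, pi


def turning_angles(points):
    """Signed turning angle (radians) at each interior vertex of an open polyline."""
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        delta = atan2(y2 - y1, x2 - x1) - atan2(y1 - y0, x1 - x0)
        angles.append((delta + pi) % (2 * pi) - pi)  # wrap into [-pi, pi)
    return angles


def smooth(values):
    """(1, 2, 1)/4 smoothing with replicated end values."""
    padded = [values[0]] + list(values) + [values[-1]]
    return [(padded[i] + 2 * padded[i + 1] + padded[i + 2]) / 4
            for i in range(len(values))]


def corners_and_inflections(points, corner_threshold):
    """Return (corner vertex indices, vertex indices followed by a sign change)."""
    k = smooth(turning_angles(points))  # k[i] belongs to vertex i + 1
    mag = [0.0] + [abs(v) for v in k] + [0.0]
    corners = [i + 1 for i in range(len(k))
               if mag[i + 1] >= corner_threshold
               and mag[i + 1] > mag[i] and mag[i + 1] >= mag[i + 2]]
    inflections = [i + 1 for i in range(len(k) - 1) if k[i] * k[i + 1] < 0]
    return corners, inflections


l_shape = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
s_shape = [(0, 0), (1, 0), (2, 1), (3, 1)]
```

On the L-shaped contour the only corner is the vertex at (3, 0); on the S-shaped contour the curvature changes sign between vertices 1 and 2, which marks an inflection.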

In obtaining color attributes from corresponding positions, we should note that the
positions of the geometric features extracted in different views do not always
correspond exactly in discrete image coordinates. This is due not only to quantization
error, but also to the fact that the edges detected to derive feature points can shift
across the boundary to the other side of the surface when the object rotates within the
image plane. This is serious because gray level edges often coincide with color
edges[5]. We therefore cannot simply use the color attributes at the geometric feature
points derived from gray level edges. To solve this problem, we picked up color values
from two positions on either side of the gray level boundary, displaced from the
geometric feature position in opposite directions along the local normal of the contour,
and used the color values from both positions. As we do not know which side of an edge
in one picture corresponds to which side in another, the distance metric between the
photometric invariant vectors associated with two feature positions should be
independent of the correspondence of those sides. Thus, the photometric invariant vector
$f_p$ and the distance metric between two such vectors (used for computing the
clustering criterion) are designed to be symmetric with respect to the sides of the
surfaces across the boundary: $f_p = [\, f_p^{1T},\ f_p^{2T} \,]^T$, where
$f_p^i = (G^i/R^i,\ B^i/R^i)$ for $\gamma$, and

$$f_p^i = \left( \frac{G^i/R^i}{G^i/R^i + G^j/R^j},\ \frac{G^j/R^j}{G^i/R^i + G^j/R^j},\ \frac{B^i/R^i}{B^i/R^i + B^j/R^j},\ \frac{B^j/R^j}{B^i/R^i + B^j/R^j} \right)$$

for $\varphi$. The indices $(i, j) \in \{(1, 2), (2, 1)\}$ denote the sides of the
surfaces with respect to their boundary. The distance metric between $f_{p_1}$ and
$f_{p_2}$ for geometric feature positions 1 and 2 is:

$$|f_{p_1} - f_{p_2}|^2 = \min\left\{ \|f_{p_1}^1 - f_{p_2}^1\|^2 + \|f_{p_1}^2 - f_{p_2}^2\|^2,\ \|f_{p_1}^1 - f_{p_2}^2\|^2 + \|f_{p_1}^2 - f_{p_2}^1\|^2 \right\} \qquad (19)$$

where $\|\cdot\|$ denotes Euclidean distance. This metric is evidently symmetric with
respect to the sides of the surfaces across the gray level boundaries, and is invariant
to rotation of the objects within the image plane. The following experiments test our
algorithm with both of the proposed invariants, $\gamma$ and $\varphi$. For each feature
position, the associated invariant $\varphi$ was computed using the color attributes of
the two points mentioned above, that is, two points slightly displaced from the
geometric feature point along the contour normal in opposite directions. As described
earlier, since gray level edges tend to coincide with color edges, the color values
collected from those two positions facing each other across the gray level edge are
usually quite different, thereby producing $\varphi$ distributions that spread over the
feature space. To satisfy the requirement for $\gamma$, namely that corresponding sets
of points be provided between the model and the data views, the object regions were
extracted prior to the application of our algorithm. This was done manually, though we
expect that it could be done automatically using cues such as motion, color, and texture
(see, e.g., [30, 29, 27, 28]). Note, however, that when using $\varphi$ this region
extraction is not necessarily required, as long as the background in the picture happens
to have colors different from those of the object. This is because $\varphi$ is a
complete invariant, unlike $\gamma$, which needs further normalization to remove scale
factors, as we have argued. The same is true for $\gamma$ when the ambient light has not
changed before and after the motion of the objects. Hereafter, we refer to the
normalized measure of $\gamma$ simply as $\gamma$.
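
The two-sided photometric vector and the symmetric distance of Eq. (19) can be sketched as below. This is a minimal illustration assuming RGB triples with a nonzero red channel; the function names and color values are hypothetical.

```python
def gamma(rgb):
    """gamma = (G/R, B/R); assumes R > 0."""
    r, g, b = rgb
    return (g / r, b / r)


def phi(rgb_i, rgb_j):
    """phi for side i given the opposite side j (four ratio components)."""
    gi, bi = gamma(rgb_i)
    gj, bj = gamma(rgb_j)
    return (gi / (gi + gj), gj / (gi + gj), bi / (bi + bj), bj / (bi + bj))


def side_vectors(rgb_1, rgb_2, use_phi=True):
    """Return (f_p^1, f_p^2) for the two sides of an edge."""
    if use_phi:
        return phi(rgb_1, rgb_2), phi(rgb_2, rgb_1)
    return gamma(rgb_1), gamma(rgb_2)


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def symmetric_sq_distance(fp_a, fp_b):
    """Eq. (19): minimum over the two possible pairings of the sides."""
    (a1, a2), (b1, b2) = fp_a, fp_b
    return min(sq_dist(a1, b1) + sq_dist(a2, b2),
               sq_dist(a1, b2) + sq_dist(a2, b1))


# Hypothetical edge: the data view has the sides swapped and the G and B
# channels scaled by 1.5 and 0.5 (a per-channel change of illumination).
side_a, side_b = (100, 50, 25), (80, 120, 40)
side_a2, side_b2 = (100, 75, 12.5), (80, 180, 20)

d_phi = symmetric_sq_distance(side_vectors(side_a, side_b),
                              side_vectors(side_b2, side_a2))
d_gamma = symmetric_sq_distance(side_vectors(side_a, side_b, use_phi=False),
                                side_vectors(side_b2, side_a2, use_phi=False))
```

In this example `d_phi` is zero up to rounding, whereas `d_gamma` is not, which is consistent with the remark that $\gamma$ needs normalization when the light changes while $\varphi$ does not.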

4.2 Experiments 

We tested our algorithm to see how accurately it can identify corresponding positions
across different pictures taken under varying light conditions and poses of the objects
to be recognized. Identifying corresponding positions perfectly is not an easy task,
because in doing so we must contend with two different kinds of instability: one in
extracting geometric features, the most serious form of which is missing features, and
the other inherent in the photometric properties of the image, such as those described
in the arguments on sensing limitations. Remember, however, that for our ultimate
objective, that is, recognizing objects using the identified positions, only three
correspondences are sufficient under the orthographic projection model[32] or the weak
perspective projection model[20]. Therefore, what has to be observed in the following
results is whether or not our algorithm could identify at least this minimum number of
correspondences. First, the results of using $\gamma$ as the photometric invariant are
shown.

[With $\gamma$ for photometric invariant]

Figure 6 shows the results of obtaining feature group centroids on the Band-Aid-Box
pictures. The surface of the box includes characters of several different colors on a
white base. All the pictures were taken so as to include the same three surfaces of the
box, which are used for the recognition. The figures in the first row from the top show
the edge maps, with the extracted geometric features superimposed as small closed
circles. The first picture from the left (hereafter first) was taken under usual light
conditions. The second and third pictures from the left (hereafter second and third)
were taken under a greenish and a bluish light, respectively, at a pose different from
that of the first one. Throughout the rest of the paper, we refer to the figures by
their order from the left in this way. The lighting conditions were changed in the same
way as in the experiments presented in section 2.4. The figures in the second and third
rows show the original and normalized distributions of $\gamma$, respectively. The
horizontal axes are for G/R and the vertical axes are for B/R. These figures show how
well the invariant property $\gamma$ remained unchanged between the different pictures.
When it performs well, the original distributions of $\gamma$ should show a similar
shape over different views, except for some scale change along the axes. That scale
distortion (e.g., dilation) should then be corrected by the normalization of the
distribution, thus ideally showing linear distributions of slope 1. Note that even if
the shapes of the distributions are distorted in addition to the dilation, we cannot
conclude that the proposed invariants performed poorly, because unstable results of the
geometric feature extraction will also distort the shape of the distribution of the
photometric properties. The intermediate results of clustering are shown in the fourth
row, in the normalized coordinates of the geometric features. In the figures of the
first row, the corresponding positions identified by our algorithm are superimposed as
large closed circles. The accuracy of our algorithm is found to be fairly good. The
perturbations of the identified positions appear to have been caused partly by unstable
results of feature extraction, e.g., missing features, rather than by clustering errors
or incompleteness of the proposed photometric invariant.
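
To make the role of the normalization concrete, the sketch below removes a per-axis scale factor from a set of $\gamma$ values by dividing by the per-axis mean. This is only one simple choice, assumed here for illustration; it is not necessarily the normalization defined earlier in the paper, and the sample values are made up.

```python
def normalize_by_mean(values):
    """Divide each (G/R, B/R) pair by the per-axis mean over the set."""
    n = len(values)
    mean_g = sum(v[0] for v in values) / n
    mean_b = sum(v[1] for v in values) / n
    return [(v[0] / mean_g, v[1] / mean_b) for v in values]


# Hypothetical gamma values at corresponding points; the second view differs
# from the first only by a dilation of 1.5 along G/R and 0.5 along B/R.
view_1 = [(0.5, 0.2), (1.0, 0.4), (1.5, 0.9)]
view_2 = [(1.5 * g, 0.5 * b) for g, b in view_1]

norm_1 = normalize_by_mean(view_1)
norm_2 = normalize_by_mean(view_2)
```

After normalization the two sets coincide up to rounding, so plotting one against the other gives points on a line of slope 1. This only works when both sets come from corresponding points, which is why the object regions have to be extracted first.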

In Figure 7, results on the Spaghetti-Box pictures, taken in the same way as the
Band-Aid-Box pictures, are given. The surfaces of this box include textures with large
and small characters, which are somewhat more cluttered than the Band-Aid-Box surface.
The first row shows the edges with the extracted geometric features superimposed on
them. The first picture was taken under usual light conditions. The second and third
pictures were taken under a greenish and a bluish light, respectively, at different
poses. The second and third rows show the original and normalized distributions of
$\gamma$, respectively. The algorithm identified the corresponding positions fairly
accurately, as we see in the top figures.

Similarly, in Figure 8 the results on the Doll pictures (the same doll as the one used
in section 2.4) are presented. Unlike the last two examples, the surface of this doll
does not have man-made texture such as characters, but only color/brightness changes,
partly due to changes of material and partly due to depth variations. The surface is
mostly smooth, except for some parts including the hair, face, and fingers. The pictures
in the first row show the edges with the extracted geometric features superimposed on
them. The first and second pictures were taken under usual light conditions, but at
different poses of the doll. The third picture was taken under a moderate greenish light
plus usual room light. For the fourth picture, we used an extremely strong tungsten
halogen lamp covered with bluish cellophane. The second and third rows show the original
and normalized distributions of $\gamma$, respectively. Comparing the shapes of the
original and normalized distributions of $\gamma$ for the first and second pictures, we
can confirm that when the light conditions have not changed, the distributions of
$\gamma$ are not affected by the change of pose of the object. The algorithm identified
the corresponding positions fairly accurately, as we see in the pictures.

[With $\varphi$ for photometric invariant]

The results of using $\varphi$ as the photometric invariant on the same pictures used
for $\gamma$ are shown next. Figure 9 presents the results on the Band-Aid-Box pictures.
The first row shows the edge maps with the extracted geometric features superimposed as
closed circles. The second row shows the respective distributions of $\varphi$. The
horizontal axes are for $(G^i/R^i)/(G^i/R^i + G^j/R^j)$, while the vertical axes are for
$(B^i/R^i)/(B^i/R^i + B^j/R^j)$, where $(i, j) \in \{(1, 2), (2, 1)\}$. As described
already, since we do not know the correspondence of the sides of the surface across the
edges (contours), we included properties from both sides of the edges. Consequently, the
distributions of $\varphi$ are 2-fold symmetric around their centroid, as seen in the
second row figures (see eq. (15)). When $\varphi$ performs well as an invariant, this
distribution should remain unchanged over different pictures. The second row figures
thus demonstrate fairly good performance for these pictures. The intermediate results of
clustering are given in the third row, in the normalized coordinates of the geometric
features. In the figures of the first row, the corresponding positions identified by our
algorithm are also superimposed as large closed circles. The accuracy of our algorithm
is again found to be fairly good.

In Figure 10, the results with $\varphi$ on the Spaghetti-Box pictures are given. The
first row shows the extracted geometric features. The second row shows the distributions
of $\varphi$. The performance of $\varphi$ is almost perfect. As we see in the pictures,
the algorithm with $\varphi$ identified the corresponding positions very well.

Figure 11 presents the results on the Doll pictures. The first row shows the edge maps
with the extracted geometric features superimposed on them. The second row shows the
respective distributions of $\varphi$. Since we used an extremely intense blue light for
the fourth picture, the blue channel of many pixels was saturated. As a consequence, the
distribution of $\varphi$ shrank in the vertical direction, as seen in the fourth
picture of the second row. For these doll pictures, the results of identifying
corresponding positions with $\varphi$ were generally not as good as those with
$\gamma$, though not very bad. This is probably because the surface colors of the doll
vary quite smoothly in most parts, so the distribution of $\varphi$ did not spread well
and therefore did not separate the clusters well in terms of color.

5 Discussions and conclusion 

We argued that by tightly combining the proposed photometric invariants with geometric
constraints, we can realize very efficient and reliable recognition of 3D objects.
Specifically, we conducted experiments on identifying corresponding feature positions
across different views taken under different conditions. Although we did not include
demonstrations of the actual recognition process, by connecting the presented method for
identifying features using photometric invariants with popular recognition algorithms,
such as the full 3D model method[20] or the Linear Combination of the models[32], we can
perform object recognition quite efficiently. This may be demonstrated elsewhere. In the
experiments, we showed that our methods could tolerate perturbations in both color and
geometric properties, and could provide at least the minimum number of correspondences
of positions necessary for object recognition. Although we extracted the object regions
manually in the experiments, this can sometimes be done easily from sequences of images
or against a simple background, or may be performed by using color segmentation. In
addition, we stress again that as long as the background has colors different from those
of the object, we can use $\varphi$ without any preliminary processing for region
extraction. This also holds true for $\gamma$ when the ambient light has remained
unchanged. The weakness of $\varphi$ appears when the discontinuities of gray level do
not coincide with those of color. In this case, the distribution of $\varphi$ does not
spread very well. This emerged in the body parts of the doll. Compared with conventional
approaches that match local features, of which there are on the order of several
hundred, the computational cost of our approach for recognizing 3D objects should be
very small. The time for identifying (about 10) corresponding feature positions, i.e.,
cluster centroids, was around 0.2 sec for pictures with several hundred features. In
addition, we can use the invariant photometric values in searching for the
correspondences between the derived feature points in the model and the image, so that
needless searches can be further suppressed.

The advantages of our approach compared with Nayar's are as follows. Their method uses
invariant photometric properties derived for regions, each with a consistent and
different color, so that color segmentation is a prerequisite. In our view, this color
segmentation is the essential process for reducing the size of the search space for
correspondences, and the photometric invariant was used only for further limiting
possible matches between the model and the data regions. Unfortunately, however,
achieving complete color segmentation is often quite hard and time consuming[29]. Of
course, the invariant can still contribute to reducing the computational cost, since in
general the number of color regions in the entire image could still be on the order of
some tens. But its contribution to the reduction of computational cost appears to be
smaller than that of color segmentation. In contrast to their approach, since our
photometric invariant can be computed purely locally, we do not necessarily need color
segmentation, as mentioned above, so our method is less demanding. In addition, since
the color properties are passed on to the subsequent clustering and feature centroid
alignment process, our method can tolerate many confounding factors that arise in real
world applications, such as inaccuracies of region and/or feature extraction. The
clustering and feature centroid alignment process is very suitable for compensating for
those uncertainties. We should also point out that, in theory, the region centroids they
used for matching cannot be used for 3D surfaces, while our feature centroids can.

An alternative way of using the proposed photometric invariant in recognition is simply
to incorporate it into the conventional framework of recognition. For example, in
selecting features to form hypothesized corresponding triples of features between the
model and the data, photometric properties can be used to limit the possible matches
between the model and the data features, trimming many needless combinations from the
search space and thereby effectively reducing the computational cost. This kind of idea
has been used in [26] for matching corresponding regions.

Figure 1: Tests of Invariant on Convex Polyhedron

The pictures show the convex polyhedron in different poses: left pose Pa, right pose Pb. This object is composed of 6 planar
surface patches, each with a different surface orientation. On each side of the boundary of adjacent surfaces, several matte
patches with different colors were pasted. We then picked up corresponding positions manually within each colored patch in
both pictures. The selected positions within patches are depicted by crosses.



Property                                          Pa&Lu - Pb&Lg    Pa&Lu - Pb&Lb
R                                                 0.988368         0.989877
G                                                 0.967951         0.974081
B                                                 0.946251         0.882816
H                                                 0.724681         0.701377
S                                                 0.914236         0.749529
V                                                 0.945473         0.668672
R - G                                             0.985398         0.985687
B - Y                                             0.935039         0.908867
G/R                                               0.978163         0.988289
B/R                                               0.962186         0.907126
$\varphi_1 = (G^1/R^1)/(G^1/R^1 + G^2/R^2)$       0.997766         0.997532
$\varphi_2 = (B^1/R^1)/(B^1/R^1 + B^2/R^2)$       0.991843         0.988893


Table 1: Correlation coefficients between the sets of the values of the color properties from different pictures of
Test-Object.

The correlation coefficients between the sets of values of the proposed invariants from pictures taken under different light
conditions and at different poses of the object are given to show how well they remain unchanged within a consistent
scale. For comparison, other color properties including (R, G, B), (H, S, V), and a linear-transformation implementation of
the opponent color model[3] are also presented. In these tests, (R, G, B), (R - G, B - Y), and $\gamma$ = (G/R, B/R) are almost
equally good, though $\gamma$ is best among them. The reason why R is also fine is probably just that we did not happen to
change the intensity of the red light spectrum. The values of (H, S, V) (hue, saturation, value) are unexpectedly unstable.
The measure $\varphi$ is extremely stable.
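
For reference, a correlation coefficient of this kind can be computed as in the sketch below. It assumes the standard Pearson coefficient, which the caption does not state explicitly, and the sample values are made up rather than taken from the experiments.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long value lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical G/R values at corresponding positions in two pictures; the
# second picture differs only by a scale factor of 1.5.
picture_a = [0.5, 0.8, 1.1, 1.6]
picture_b = [1.5 * v for v in picture_a]
r_scaled = pearson(picture_a, picture_b)
```

Because the coefficient ignores a positive scale factor, a value close to 1 indicates that the property is preserved up to scale, which is the sense of "within a consistent scale" above.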

Figure 2: Distributions of invariants on Convex Polyhedron

The left two columns are from pictures taken under Pa&Lu (horizontal axis) and Pb&Lg (vertical axis), and the right two
columns are from pictures under Pa&Lu (horizontal axis) and Pb&Lb (vertical axis). The rows in each pair of columns are
respectively: top left and right: H and S; middle left and right: G/R and B/R; bottom left and right:
$\varphi_1 = (G^1/R^1)/(G^1/R^1 + G^2/R^2)$ and $\varphi_2 = (B^1/R^1)/(B^1/R^1 + B^2/R^2)$.

Figure 3: Tests of $\gamma$ at different poses of object but under the same illuminant conditions

The first from the left: distribution of Blue; the second: B - Y (Blue vs. Yellow); the third: G/R; the fourth: B/R. The
horizontal axis is for the pose Pa and the vertical axis is for the pose Pb. Except for the two samples in the upper right area
of the distribution, the B/R component of $\gamma$ is found to be almost unchanged between the pictures because the slope is
almost 1, while B - Y and B are perturbed around the slope of 1. Those two exceptional samples were from patches with an
almost saturated blue channel in the picture at pose Pb. The distribution of the G/R component of $\gamma$ is almost perfect.
This gives evidence that $\gamma$ may be used for object recognition without applying any normalization process, so that
extracting object regions might not be a prerequisite, as long as the lighting conditions are not changed.



Figure 4: Tests of Invariant on natural pictures 

The pictures show a doll at different poses: left pose A, right pose B. We picked up corresponding positions in both views. 
The selected positions are depicted by crosses. 

Property                                          Pa&Lu - Pb&Lg    Pa&Lu - Pb&Lb
R                                                 0.764343         0.819267
G                                                 0.588161         0.881416
B                                                 0.936572         0.843604
H                                                 0.951843         0.923887
S                                                 0.934587         0.490994
V                                                 0.398850         0.459425
R - G                                             0.764240         0.939152
B - Y                                             0.948642         0.877519
G/R                                               0.779377         0.944164
B/R                                               0.962186         0.895180
$\varphi_1 = (G^1/R^1)/(G^1/R^1 + G^2/R^2)$       0.996245         0.998781
$\varphi_2 = (B^1/R^1)/(B^1/R^1 + B^2/R^2)$       0.988840         0.983675


Table 2: Correlation coefficients between the sets of the values of the color properties from different pictures of the
Doll.

The results on a natural object, a doll, are given. The first picture was taken under usual lighting conditions from an oblique
angle (Pa&Lu); the second and third were taken under a greenish and a bluish light, respectively, from the front (Pb&Lg,
Pb&Lb). This time, for $\varphi$, with $\varphi_1 = (G^1/R^1)/(G^1/R^1 + G^2/R^2)$ and
$\varphi_2 = (B^1/R^1)/(B^1/R^1 + B^2/R^2)$, the two positions closest to each other were used. In these tests, R, G, B and
H, S, V were very unstable. The linear model R - G, B - Y and $\gamma$ = (G/R, B/R) performed well again, though $\gamma$ was
better. The measure $\varphi$ is quite stable again.

Figure 5: The distributions of invariant measures on Doll pictures.

The left two columns are from pictures taken under Pa&Lu (horizontal axis) and Pb&Lg (vertical axis), and the right two
columns are from pictures under Pa&Lu (horizontal axis) and Pb&Lb (vertical axis). The rows in each pair of columns are
respectively: top left and right: H and S; middle left and right: G/R and B/R; bottom left and right:
$\varphi_1 = (G^1/R^1)/(G^1/R^1 + G^2/R^2)$ and $\varphi_2 = (B^1/R^1)/(B^1/R^1 + B^2/R^2)$.

Figure 6: Tests with $\gamma$ on Band-Aid-Box pictures

Edge maps are shown with the extracted geometric features superimposed on them in the first row. The first picture (from the
left) was taken under usual light conditions. The second and third pictures were taken under a greenish and a bluish light,
respectively, at a different pose. The corresponding positions identified by our algorithm are also superimposed as large
closed circles. The figures in the second and third rows show the original and normalized distributions of $\gamma$,
respectively. The intermediate results of clustering are shown in the fourth row, in the normalized coordinates of the
geometric features.

Figure 7: Tests with $\gamma$ on Spaghetti-Box pictures

The surface of this box includes colored textures with large and small characters. The pictures in the first row show
the edges with the extracted geometric features superimposed on them. The first picture (from the left) was taken under usual
light conditions. The second and third pictures were taken under a greenish and a bluish light, respectively, at a pose
different from that of the first one. The second and third rows show the original and normalized distributions of $\gamma$,
respectively. The identified positions are depicted by large closed circles in the figures of the first row. The algorithm
identified the corresponding positions fairly accurately, as we see in the upper figures.

Figure 8: Tests with $\gamma$ on Doll pictures

The surface of this doll does not have man-made texture such as characters, but only color/brightness variation, partly due
to changes of material and partly due to depth variations. The surface is mostly smooth except for some parts including
the hair, face, and fingers. The first row shows the edge maps with the extracted geometric features superimposed as
small closed circles. The first and second pictures (from the left) were taken under usual light conditions, but at
different poses of the doll. The third picture was taken under a moderate greenish light plus usual room light. For the fourth
picture, we used an extremely strong tungsten halogen lamp covered with bluish cellophane. The second and third
rows show the original and normalized distributions of $\gamma$, respectively. The identified positions are depicted by large
closed circles in the figures of the first row. The algorithm identified the corresponding positions fairly
accurately, as we see in the figures.

Figure 9: Tests with $\varphi$ on Band-Aid-Box pictures

The pictures in the upper row show the edge maps with the extracted geometric features superimposed on them. The first
picture (from the left) was taken under usual light conditions. The second and third pictures were taken under a greenish
and a bluish light, respectively, at a pose different from that of the first one. The second row shows the respective
distributions of $\varphi$. The third row shows the intermediate results of the clustering. The identified positions are
depicted by large closed circles in the figures of the first row.

Figure 10: Tests with $\varphi$ on Spaghetti-Box pictures

The surface of this box includes colored textures with large and small characters. The upper pictures show the edges with
the extracted geometric features superimposed on them. The first picture was taken under usual light conditions. The second
and third pictures were taken under a greenish and a bluish light, respectively, at a different pose. The lower figures show
the respective distributions of $\varphi$. The identified positions are depicted by large closed circles in the figures of the
upper row. The algorithm identified the corresponding positions fairly accurately, as we see in the upper figures.

Figure 11: Tests with $\varphi$ on Doll pictures

The surface of this doll does not have man-made texture such as characters, but only color/brightness variation due to the
change of material. The surface is mostly smooth except for some parts including the hair, face, and fingers. The pictures
in the upper row show the edges with the extracted geometric features superimposed on them. The first and second pictures
were taken under usual light conditions, but at different poses of the doll. The third picture was taken under a moderate
greenish light, and the fourth picture was taken under an extremely bright bluish light. The lower figures show the respective
distributions of $\varphi$. The identified positions are depicted by large closed circles in the figures of the upper row. The
algorithm identified the corresponding positions fairly well, as we see in the pictures.